ABSTRACT 

Background and objectives: Although ethnic group 
variations in cancer exist, no multiethnic, population- 
based, longitudinal studies are available in Europe. Our 
objectives were to examine ethnic variation in all- 
cancer, and lung, colorectal, breast and prostate 
cancers. 

Design, setting, population, measures and 
analysis: This retrospective cohort study of 4.65 
million people linked the 2001 Scottish Census 
(providing ethnic group) to cancer databases. With the 
White Scottish population as reference (value 100), 
directly age standardised rates and ratios (DASR and 
DASRR), and risk ratios, by sex and ethnic group with 
95% CI were calculated for first cancers. In the results 
below, 95% CI around the DASRR excludes 100. Eight 
indicators of socio-economic position were assessed 
as potential confounders across all groups. 
Results: For all cancers the White Scottish population 
(100) had the highest DASRRs, Indians the lowest 
(men 45.9 and women 41.2) and White British (men 
87.6 and women 87.3) and other groups were 
intermediate (eg, Chinese men 57.6). For lung cancer 
the DASRRs for Pakistani men (45.0), and women 
(53.5), were low and for any mixed background men 
high (174.5). For colorectal cancer the DASRRs were 
lowest in Pakistanis (men 32.9 and women 68.9), 
White British (men 82.4 and women 83.7), other White 
(men 77.2 and women 74.9) and Chinese men (42.6). 
Breast cancer in women was low in Pakistanis (62.2), 
Chinese (63.0) and White Irish (84.0). Prostate cancer 
was lowest in Pakistanis (38.7), Indian (62.6) and 
White Irish (85.4). No socio-economic indicator was a 
valid confounding variable across ethnic groups. 
Conclusions: The 'Scottish effect' does not apply 
across ethnic groups for cancer. The findings have 
implications for clinical care, prevention and screening. 
For example, the response to the known low uptake of 
bowel screening among South Asian populations might 
benefit from modelling of the cost-effectiveness of 
screening, given their comparatively low cancer rates. 



ARTICLE SUMMARY 



Article focus 

■ The Scottish Health and Ethnicity Linkage Study 
examined whether all cancers, and lung, colorec- 
tal, breast and prostate cancer separately, in the 
period 2001-2008, varied by 2001 Scottish 
Census ethnic group categories. 

Key messages 

■ The main public health lesson and challenge is 
for the majority population, for the 'Scottish 
effect' in relation to cancer does not apply across 
Scotland's ethnic groups. 

■ This exemplifies how the study of ethnic varia- 
tions provides a public health approach with 
potential to benefit the entire population. 

Strengths and limitations of this study 

■ The strength of the study is the development of 
a retrospective cohort with high overall linkage 
rates in a national population; the exploration of 
the potential role of socio-economic variables 
and country of birth available in the Census; and 
the linkage of Census data to both cancer regis- 
try and community/hospital mortality data. 

■ The limitations include the small numbers of 
outcomes for some non-White populations, and 
the consequent aggregation of some ethnic 
groups; variation in linkage rates by ethnic 
group; inability to capture events that occur over- 
seas outside the UK and lack of individually link- 
able cancer risk factor data. 



INTRODUCTION 

Cancer is a dominant cause of death in indus- 
trialised countries, and particularly common in 
Scotland. 1 Cancer incidence varies hugely 
across countries, between country of birth/ 
ethnic groups and over time, thus clearly indi- 
cating that the causes of cancer are largely 
environmental. Examination of such variations, 
including by country, by country of birth and when possible 
ethnic group, has proven to be of value both in sparking 
causal research and in assessing disease burden, healthcare 
priorities and patients' needs. 2 

Given international variations, it is not surprising that 
major differences in cancer frequency are demonstrable 
by ethnic group. 2 3 Ethnic group studies on cancer have 
mostly utilised the proxy indicator country of birth, 
which is usually available in both population registries 
and censuses (supplying denominators) and sometimes 
in cancer and death registration systems. The limitations 
of this proxy have been discussed elsewhere, 4 5 including 
that, especially in European countries with colonial histories 
such as Scotland, many of the elderly were born 
abroad, and substantial proportions (often 50% or 
more) of resident ethnic minority populations are not 
born abroad. Name search methods are also popular 6-9 
but have even more limitations, for example, they are 
not good for studying White minority groups and 
African and Caribbean origin Black populations. 4 5 A 
recent survey of European cancer registries concluded 
that while self-reported ethnicity was the exemplary vari- 
able, none of 79 registries analysed data this way, with 
Scotland being closest to achieving this goal. 4 

Within multiethnic countries proper ethnic group 
data are needed to maintain valid surveillance of cancer 
trends and inequalities, to set priorities, to ensure equit- 
ability of service delivery and to further develop hypoth- 
eses on causation. 10 The few studies that use reported 
ethnic group in Europe may have a high proportion 
with missing ethnicity. The best such studies combine 
this with country of birth. 3 8 Such is the scarcity of data 
that a 2007 paper reported on observer-assigned ethnic 
group on 2713 people followed up for 19.9 years, yield- 
ing six cases in South Asian men, and 26 in 
African-Caribbean men. 11 Linkage of cancer registration 
and hospital episode statistics (providing ethnicity) in 
England is demonstrating the importance of this 
approach, despite some limitations, for example, missing 
data. 12-15 Most available studies in Europe analyse data 
at a point or period of time, that is, cross-sectional ana- 
lyses using numerators and denominators from different 
sources, creating potential errors in calculations of 
rates. 3 6 The field is developing internationally with 
recent work using name search methods in Canada ' and 
linkage methods in New Zealand, 17 with interest in 
multination comparisons for specific ethnic groups. 18 

Ethnic variations in cancer, mostly using country of 
birth, 3 have been noted with, for example, comparatively 
lower mortality for all combined and four major cancers 
in South Asian migrants in England and Wales. 3 16 
Studies based on country of birth 16 and ethnicity data in 
England and Wales 12 support the role of environmental 
factors in explaining this variation. There is evidence of 
change over time and across generations, though cancer 
inequalities persist, some narrowing, others widening. 19 

The Scottish Health and Ethnicity Linkage Study compared 
all cancers (without non-melanoma skin cancers), 
and lung, colorectal, breast and prostate cancer separately, 
in the period 2001-2008 by ethnic group categories 
as reported in the 2001 Scottish Census. 20 These 
cancers were chosen as the commonest cancers in 
Scotland, prioritised by national public health strategy. 1 

Scotland has a higher incidence of cancers compared 
to England and Wales and people born in Scotland 
living in England and Wales also have comparatively 
high rates. 16 19 The background information on 
Scotland's health services, cancer data systems, the 
ethnic mix of the population and previous research on 
Scottish populations by ethnic group has been sum- 
marised recently by Arnold and Brewster 4 (Ch 4.4). Data 
on cancer by ethnic group in Scotland are old, limited 
in scope and from small-scale studies 2 9 21 22 focusing 
solely on Chinese, South Asians and Italians and pub- 
lished in the 1980s and early 1990s. These studies are 
summarised in box 1. 

This paper reports new, more comprehensive data 
from Scotland using a national, retrospective cohort 
study. It also examines the potential for adjusting for 
socio-economic confounding and the effects of country 
of birth in relation to ethnicity. Finally, in the absence 
of linkable risk factor data, we interpret our results 
indirectly using risk factor data from the Health Survey 
for England and the Scottish Health Survey. 

METHODS 

The methods of our retrospective cohort study are published, 
and key details on linkage are also given in 
appendix 1. 20 23 We followed a strict protocol that preserved 
anonymity and maintained separation of personal 
data from the Census and NHS, and clinical data (see 
also ethics below). We used computerised matching of 
names, addresses and dates of birth to link the Census 
2001 for Scotland, which provided ethnic group as 
reported by either individuals or the householder completing 
the form based on a question followed by a 
choice of 14 categories (appendix 1, table A1, which 
also provides linkage by ethnic group), and other demographic 
and socio-economic variables, to the Scottish 
Community Health Index (CHI), which is a register of 
patients using the NHS. We then matched, using CHI 
number, to an already linked database of deaths (in the 
community and in hospital) and cancer registration 
records (SMR06). 

Box 1 Brief overview of Scottish studies on ethnic variations in cancer 

► Muir reported that Harkness (1993, unpublished) examined 
nasopharyngeal cancers in Scotland, identifying Chinese 
people by name recorded on the cancer register. The age 
standardised rate was 0.3/100 000 in the entire Scottish 
population, and 13.7 in people with Chinese names. 2 

► Black found substantial differences between Italian-born 
residents and the Scottish population in laryngeal and stomach 
cancer (higher in men) and lung cancer (lower in men and 
women). 21 

► Merchant et al identified Indian and Pakistani men by name in 
the cancer registry and compared cancer rates to those of the 
Bombay cancer registry and the whole Scottish population. 22 
Oral cancer in Scottish Indians/Pakistanis was intermediate 
between the Bombay and Scotland rates. Similar observations 
were made for lung cancer in men and breast cancer in women. 

► Matheson et al found cancers between 1961 and 1981 in 
South Asian adults by name search in the West of Scotland, 
reporting comparatively low rates of colorectal, breast and 
bronchial cancer, but high rates of cervical cancer. 9 
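
The two-stage linkage described in the Methods (Census to CHI on personal identifiers, then CHI number to outcome records) can be illustrated with a minimal Python sketch. It assumes exact matching on a normalised name, date of birth and postcode; the study's actual matching procedure is documented in its appendix 1 and cited methods papers and may well differ (for example, by using probabilistic matching). All field names, identifiers and records below are hypothetical.

```python
def match_key(record):
    """Build a normalised key from name, date of birth and postcode (assumed fields)."""
    name = "".join(record["name"].lower().split())
    postcode = "".join(record["postcode"].upper().split())
    return (name, record["dob"], postcode)


def link_census_to_chi(census_records, chi_register):
    """Stage 1: return {census_id: chi_number} for records with one exact match."""
    index = {}
    for person in chi_register:
        index.setdefault(match_key(person), []).append(person["chi_number"])
    links = {}
    for rec in census_records:
        candidates = index.get(match_key(rec), [])
        if len(candidates) == 1:  # ambiguous or missing matches stay unlinked
            links[rec["census_id"]] = candidates[0]
    return links


def attach_outcomes(links, outcome_records):
    """Stage 2: join outcome records to census ids through the CHI number."""
    by_chi = {}
    for outcome in outcome_records:
        by_chi.setdefault(outcome["chi_number"], []).append(outcome)
    return {cid: by_chi.get(chi, []) for cid, chi in links.items()}


# Hypothetical records, for illustration only
census = [
    {"census_id": "c1", "name": "Ann  Smith", "dob": "1950-02-01", "postcode": "eh1 1aa"},
    {"census_id": "c2", "name": "Raj Patel", "dob": "1962-07-15", "postcode": "G1 2BB"},
]
chi = [
    {"chi_number": "0000000001", "name": "ann smith", "dob": "1950-02-01", "postcode": "EH1 1AA"},
]
outcomes = [{"chi_number": "0000000001", "icd10": "C34", "date": "2004-03-09"}]

links = link_census_to_chi(census, chi)
linked_outcomes = attach_outcomes(links, outcomes)
```

In this sketch the second census record has no CHI match and stays unlinked, which is the kind of loss that the reported linkage rates by ethnic group quantify.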

Ethnic group is a legally required field that was well 
completed (95.8%) and, after imputation (4.3%), available 
for 100% of those completing the census form 
(which is also a legal obligation). (For details see: 
http://www.gro-scotland.gov.uk/census/censushm/censcr02/data-quality/census-variables/results-and-conclusions/appendix-d-person-items-reports-and-tables-plO-to-p-17.html; 
accessed 26 April 2012). About 95% of the people 
participating in the 2001 census (4.9 million) were linked 
as above to health records, that is 4.65 million, with 85% 
or more linked in every ethnic group 20 (see appendix 1). 
The total estimated Scottish population was 5.06 million, so 
our cohort of 4.65 million includes about 92% of the 2001 
population. While the identities of those not completing a 
census form are unknown, it is estimated in census validity 
studies that a higher proportion of non-White than White 
groups were non-completers, estimated at, for example, 
10.2% of Pakistanis and 3.8% of White Scottish. 
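
The two coverage figures quoted above follow directly from the three population counts in the text. A short Python check, using only those rounded counts (in millions) and hypothetical variable names:

```python
# Counts in millions, as quoted in the text
census_participants_m = 4.9
linked_cohort_m = 4.65
estimated_population_m = 5.06


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


linkage_rate = percent(linked_cohort_m, census_participants_m)
population_coverage = percent(linked_cohort_m, estimated_population_m)
print(f"{linkage_rate:.1f}% of census participants linked")
print(f"{population_coverage:.1f}% of the estimated population covered")
```

Both values round to the "about 95%" and "about 92%" reported in the paragraph.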

The ethnic group categories (and labels) follow those 
of the Scottish Census 2011, given in appendix 1. 20 
Because of small numbers we grouped Bangladeshis 
with other South Asians; and Caribbean, African and 
Black Scottish or other Black, into one 'African origin' 
group. Further grouping was sometimes necessary 
because of small numbers in analysis of specific cancers, 
as described in the results. Following our analytical 
strategy, ethnic groups were sometimes omitted to 
avoid potential disclosure of identity. 

About 90% of the cases were obtained from the cancer 
registry, 10% from mortality files. Cancers are registered at 
diagnosis, so mortality data add cases where the diagnosis 
was first made outside Scotland, which is especially import- 
ant for mobile ethnic minority groups. A date of embark- 
ation field is in the registry but we did not think this was 
reliable enough in relation to non-UK migration to use to 
adjust denominators. More than 90% of the Scottish 
Cancer Registry records for 2001-2008 were linked to our 
census-extract file. We excluded non-melanoma skin 
cancer. The ICD codes used are in box 2. Other non- 
cancer health outcomes were excluded from the analysis 
file for reasons given in the ethics section below. 

To minimise the numbers of age/sex cells with no 
cases, which creates instability in the analysis, we 
restricted analysis by age as follows: >20 years for all 
cancer; >30 years for lung cancer; >20 years for breast 
cancer; >30 years for colorectal cancer; and >40 years for 
prostate cancer. This led to few omissions, ranging from 
0.1% to 1.9% depending on the specific diagnosis. 

Box 2 ICD codes used in the study 

Up to 31 December 1996 ICD9 codes were used by the Cancer 
Registry (needed for 10 year look-back) 

Lung cancer: ICD9 162 
Breast cancer: ICD9 174 
Prostate cancer: ICD9 185 
Colorectal cancer: ICD9 153-154 
All cancers: ICD9 140-208 
All cancer without non-melanoma skin cancers: ICD9 140-172 and 174-208 

From 1 January 1997 in Cancer Registry and from 1 January 
2000 in mortality data ICD10 codes were used 

Lung cancer: ICD10 C33-C34 
Breast cancer: ICD10 C50 
Prostate cancer: ICD10 C61 
Colorectal cancer: ICD10 C18-C21 
All cancers: ICD10 C00-C97* 
All cancer without non-melanoma skin cancers: ICD10 C00-C43 and C45-C97 

*C97 is multiple cancer sites, used in mortality data only. 
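
The ICD10 ranges in box 2 and the age restrictions above can be expressed as a small lookup. The following Python sketch is illustrative only: the function and dictionary names are hypothetical, it handles ICD10 codes but not the ICD9 look-back codes, and it reads ">20 years" as strictly older than 20, which is an assumption about the study's exact cut-offs.

```python
# ICD10 'C' code number ranges per site, taken from box 2
SITE_RANGES = {
    "lung": (33, 34),
    "breast": (50, 50),
    "prostate": (61, 61),
    "colorectal": (18, 21),
}

# Analysis restricted to people older than these ages (years)
MIN_AGE = {"all": 20, "lung": 30, "breast": 20, "colorectal": 30, "prostate": 40}


def icd10_number(code):
    """Return the two-digit number of an ICD10 'C' code, or None if not a C code."""
    code = code.strip().upper()
    if len(code) >= 3 and code[0] == "C" and code[1:3].isdigit():
        return int(code[1:3])
    return None


def cancer_groups(code):
    """Return the set of analysis groups an ICD10 code contributes to."""
    number = icd10_number(code)
    groups = set()
    if number is None or number > 97:
        return groups
    if number != 44:  # C44 is the code left out of C00-C43 and C45-C97
        groups.add("all")
    for site, (low, high) in SITE_RANGES.items():
        if low <= number <= high:
            groups.add(site)
    return groups


def eligible(group, age):
    """Apply the age restriction for one analysis group."""
    return age > MIN_AGE[group]
```

For example, `cancer_groups("C34.1")` yields the all-cancer and lung groups, while `cancer_groups("C44")` yields none because that code falls outside the box 2 ranges.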

We analysed only first events, that is, newly diagnosed 
cancers occurring between 2001 and 2008. First event 
meant that there was no record of the cancer diagnosis 
under study in the preceding 10 years in the mortality and 
cancer registration (SMR06) linked file. The cancer regis- 
try collects data from a range of sources including path- 
ology laboratories, so our cases are likely to be new ones. 
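
The first-event rule with its 10-year look-back can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and treating the look-back as exactly 10 calendar years before the event date is an assumption.

```python
from datetime import date


def years_before(d, years):
    """Return the date `years` calendar years before d."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # 29 February when the target year is not a leap year
        return d.replace(year=d.year - years, day=28)


def is_first_event(event_date, earlier_dates, lookback_years=10):
    """True if no record of the same diagnosis falls in the look-back window."""
    window_start = years_before(event_date, lookback_years)
    return not any(window_start <= d < event_date for d in earlier_dates)


# Hypothetical dates: a 1995 record blocks a 2003 event, a 1990 record does not
blocked = is_first_event(date(2003, 6, 1), [date(1995, 1, 1)])
allowed = is_first_event(date(2003, 6, 1), [date(1990, 1, 1)])
```

The 10-year window is why box 2 lists ICD9 codes: records before 1997 in the registry were coded in ICD9.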

For first cancers, overall and for each cancer site, 
by sex, we calculated: directly age standardised cumulative 
incidence rates (DASRs) per 100 000/year using 10-year age 
groups; DASR ratios (DASRRs); risk ratios (RRs) using 
Poisson regression with robust variance, adjusting for age 
and country of birth; and 95% CIs around the summary 
measures. To assess effects of out-migration we calculated 
RRs using moving averages for 3-year time periods 2001-2004, 
2002-2005, etc. In appendix 2, we provide details of our 
approach in calculating rates and RRs, including details 
of the Poisson modelling. The standard reference popu- 
lation was the White Scottish population. For ease of 
interpretation we multiplied ratios by 100 to get whole 
numbers interpretable as percentages. We adjusted the 
RRs for country of birth being Scotland or outside 
Scotland. Relatively few cases in ethnic minority popula- 
tions were born in Scotland, for example, for all cancers 
excepting non-melanoma, the proportion was 5.1% in 
other White British, 11.2% in Indians, 18.5% in Pakistani, 
8% of Chinese and 36% of African origin groups. In the 
small any mixed background group 64.7% were born in 
Scotland. For this reason, that is, statistical precision, ana- 
lysis is not stratified by country of birth. 
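
Direct age standardisation and the rate ratio scaled to 100 can be sketched in a few lines of Python. This is a generic illustration of the technique, not the calculation in appendix 2: the function names are hypothetical, denominators are treated as person-years (an assumption), and the age-band counts are invented. The final lines use two published rates from table 2.

```python
def directly_standardised_rate(events, person_years, standard_population, per=100_000):
    """Weight age-specific rates by the standard population's age structure."""
    total = sum(standard_population)
    rate = 0.0
    for d, n, w in zip(events, person_years, standard_population):
        rate += (w / total) * (d / n)
    return rate * per


def rate_ratio_percent(rate, reference_rate):
    """Rate ratio multiplied by 100, so the reference group equals 100."""
    return 100.0 * rate / reference_rate


# Invented counts for three age bands, for illustration only
dasr = directly_standardised_rate(
    events=[1, 4, 5],
    person_years=[1000, 2000, 500],
    standard_population=[1000, 2000, 1000],
)

# Published lung cancer DASRs for men (table 2): other White British vs White Scottish
dasrr_other_white_british = rate_ratio_percent(124.5, 178.5)
```

With the table 2 rates, the ratio rounds to 69.7, matching the published DASRR for other White British men.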

We examined, in each ethnic group, whether there was 
an association between eight indicators of socioeconomic 
position and all cancer rates (at all ages) and hence 
whether any were potentially valid confounding factors 
across all our ethnic groups. The indicators were: (1) the 
postcode (zipcode)-based Scottish Index of Multiple 
Deprivation, (2) car ownership, (3) highest qualification 
of the individual, (4) highest qualification in the house- 
hold, (5) National Statistics Socio-economic Classification 
at individual, and (6) household levels, (7) household 
tenure and (8) economic activity in the previous week (of 
the Census completion date). 

Data were analysed using SAS V.9 (SAS Institute Inc, 
Cary, North Carolina, USA) and Stata 11 (StataCorporation 
2009; Statistical Software: Release V.11.0; College Station, 
Texas, USA). 

In the Results section we provide both absolute (DASRs) 
and ratio (DASRRs and RRs) measures and describe find- 
ings where the 95% CI does not include 100, the value for 
the reference White Scottish population. 
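
The reporting rule above amounts to checking whether a 95% CI lies wholly on one side of 100. A minimal Python sketch, with a hypothetical function name and intervals taken from table 2 (lung cancer, men):

```python
def excludes_reference(lower, upper, reference=100.0):
    """True when the interval lies wholly above or wholly below the reference."""
    return upper < reference or lower > reference


pakistani_men = excludes_reference(21.0, 69.0)        # DASRR 45.0
white_irish_men = excludes_reference(84.6, 110.9)     # DASRR 97.8
any_mixed_men = excludes_reference(103.1, 245.9)      # DASRR 174.5
```

Under this rule the Pakistani and any mixed background results for men would be described, while the White Irish result would be treated as similar to the reference.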

ETHICS AND DISCLOSURE 

The work was approved by the Multicentre Research 
Ethics Committee for Scotland and the Privacy Advisory 
Committee of NHS National Services Scotland. The 
ethical and other permissions and related issues have 
been reported in detail, 20 23 including an independent 
assessment by an ethicist. 24 To comply with the Data 
Protection Act and safe-setting rules the data set only 
contained cancer outcomes. Other outcomes were 
excluded to minimise risks of inadvertent disclosure of 
identity. The analysis was conducted on a standalone 
computer in a locked room in the General Register 
Office for Scotland (GROS), now known as National 
Records of Scotland, by named researchers (NB, MS, GB; 
see contributors), following a strict disclosure protocol. 
Outputs leaving the safe setting (including this paper) 
were screened by a GROS disclosure committee. 

RESULTS 

All cancers without non-melanoma skin cancer 

Table 1 and figure 1 show that in men and women, with 
the exception of men in the any mixed background 
group (where the 95% CI included the reference value), 
the White Scottish population had the highest rates and 
ratios of cancer (DASRR of 100 by definition), above 
even other White groups. The rates (and DASRRs) were 
particularly low in Indian (45.9 in men and 41.2 in 
women), Pakistani (49.3 in men and 65.0 in women) 
and Chinese (57.6 in men) populations. Including 
country of birth as a covariate, as shown by comparing 
the age-adjusted and age and country of birth-adjusted 
RRs (table 1), only slightly altered these patterns, 
though in this analysis 95% CIs were more likely to 
include the reference value. Generally, this adjustment 
closed the gap slightly between the reference and each 
comparison population. 

As shown in appendix 3 and table A2, except for the 
African origin group, and other South Asian women, RRs 
were similar in the time periods 2001-2004, 2002-2005, 
2003-2006, 2004-2007 and 2005-2008, indicating that, 
with the few exceptions above, unmeasured, differential 
emigration was not underlying these ethnic variations. 
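
The overlapping time periods used for this check can be generated mechanically, and the stability of a group's RR across them summarised by its range. The Python sketch below is illustrative: the function names are hypothetical and the RR values are invented, since the per-period estimates are in appendix 3 rather than in the text.

```python
def moving_windows(first_year, last_year, span):
    """Return overlapping (start, end) periods, each `span` years apart."""
    return [(start, start + span) for start in range(first_year, last_year - span + 1)]


def rr_spread(rr_by_window):
    """Difference between the largest and smallest RR across the windows."""
    values = list(rr_by_window.values())
    return max(values) - min(values)


windows = moving_windows(2001, 2008, 3)

# Invented RRs for one ethnic group, one per window, for illustration only
example_rrs = dict(zip(windows, [58.0, 60.5, 59.2, 61.0, 60.1]))
spread = rr_spread(example_rrs)
```

A small spread across the five periods is what the text describes as similar RRs; a marked drift would instead point towards differential emigration altering the denominators.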

Lung cancer 

Table 2 and figure 2 show that with the exception of the 
White Irish (similar), and any mixed background men 
(higher), all other ethnic groups had lower lung cancer 
standardised rates (and ratios) than the White Scottish 
population. The low DASRR for Pakistani men (45.0) and 
Chinese men (63.1) and a high DASRR for any mixed 
background men (174.5) were notable. The DASRs show 
that, in every group except for Chinese, men had much 
higher rates of lung cancer than women had. 

Including country of birth as a covariate raised the 
RRs in every ethnic group, indicating Scottish-born 
people in these ethnic groups are at higher risk of lung 
cancer than those born abroad. 

Colorectal cancer 

Table 3 and figure 3 show large differences by ethnic 
group, with the highest DASRs for colorectal cancer in 
White Scottish and Irish men. Pakistani men 
(DASRR=32.9) and women (68.9) and Chinese men 
(42.6) had very low ratios with other White British (82.4 
in men and 83.7 in women) and other White (77.2 in 
men, 74.9 in women) groups being intermediate. (Data 
for Indians are omitted because of the risk of disclosure, 
but the results have been examined and the pattern is 
similar to that in Pakistanis.) 

Including country of birth as a covariate made little 
difference to the patterns observed, for example, the RR 
in Pakistani men changed from 45.6 to 46.4. 

Breast cancer in women 

Table 4 and figure 4 show large ethnic variations (but, for 
once, no advantage to the other White British popula- 
tion). White Irish populations (84.0) had lower DASRRs 
than the White Scottish population but DASRRs were 
especially low for Pakistani (62.2) and Chinese (63.0) 
populations. For Indian (86.5) and other South Asian 
(88.2) groups the rate ratios were closer to the reference 
value and the 95% CI included this. Adjustment for 
country of birth hardly altered the results. 

Prostate cancer 

Table 5 and figure 5 show large ethnic differences in 
prostate cancer, with DASRRs as low as 38.7 in the 
Pakistani group, and considerably lower than in Indians 
(62.6). The other White British group (111.8) had a 
higher DASRR for prostate cancer than the White 
Scottish reference, while the White Irish (85.4) had a 
lower one. The African origin population had a high 
DASRR (138.1) but the 95% CIs included 100. (Moving 
average analysis showed little variation across time 
periods, but the data were not released because of risks 
of disclosure.) Adjustment for country of birth attenuated 
the risk difference in other White British, but 
across the other ethnic groups the RRs were lowered, 
suggesting that being born in Scotland was protective. 

Table 1 All first cancers excluding non-melanoma skin cancer occurring between May 2001 and April 2008: directly age standardised annual rates per 100 000 population/year by ethnic group and sex, and related rate ratios, and age and country of birth-adjusted risk ratios (Poisson regression), with corresponding 95% CIs 

Figure 1 Any cancer age standardised rate ratio by ethnic group. 

Socio-economic factors 

Appendix 4 (tables A3 and A4) shows the relationship 
between eight socio-economic variables and all cancers 
(all ages) by ethnic group. There was inconsistency in the 
relationships with no variable being consistently asso- 
ciated in the same direction with cancer in each ethnic 
group. These variables, therefore, did not meet the 
requirement of a confounding variable for our purposes. 
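
The consistency criterion described above can be made concrete with a short Python sketch. It assumes that each socio-economic variable is summarised, within each ethnic group, by a single ratio comparing less with more advantaged categories, and that "consistent" means every group's ratio falls on the same side of 1. Both are assumptions for illustration; the function names and all ratios are hypothetical, and the study's actual assessment is in appendix 4.

```python
def direction(ratio, reference=1.0):
    """Return +1, -1 or 0 for a ratio above, below or equal to the reference."""
    if ratio > reference:
        return 1
    if ratio < reference:
        return -1
    return 0


def consistent_direction(ratio_by_group):
    """True only if every ethnic group's association points the same way."""
    signs = {direction(r) for r in ratio_by_group.values()}
    return signs == {1} or signs == {-1}


# Invented ratios for two indicators, for illustration only
car_ownership = {"White Scottish": 1.25, "Pakistani": 0.90, "Chinese": 1.10}
housing_tenure = {"White Scottish": 1.30, "Pakistani": 1.05, "Chinese": 1.12}

car_ownership_usable = consistent_direction(car_ownership)
housing_tenure_usable = consistent_direction(housing_tenure)
```

In this invented example only the second indicator would pass; the study reports that none of its eight indicators did.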

DISCUSSION 
Principal findings 

To our knowledge this is the first-reported European 
census-to-cancer data linkage exploring ethnic varia- 
tions, though similar work has been done without the 
ethnicity angle in Iceland. 25 Developing the method is, 
therefore, a key result. While disaggregating White sub- 
populations has been recommended, 26 examples are 
rare, even though country of birth work in England 
shows substantially higher all-cause, cardiovascular and 
cancer death rates in Ireland-born and Scotland-born 
residents. 15 Even recent incidence studies have omitted 
this opportunity. 12 13 The observation that the White 
Scottish population, except for breast cancer and pros- 
tate cancer in the other White British, generally have 
higher rates than other ethnic groups in the same envir- 
onment, further emphasises the challenge in Scotland. 1 
Differences in cancer rates between many non-White 
and White populations have been demonstrated previ- 
ously 3 including in Scotland. 2 3 9 21 Our advances here 
are to provide (retrospective) cohort data; to use the 
recommended measure of reported and not observer- 
assigned ethnic group; to provide data by a broad range 
of ethnic groups including White subgroups; to examine 
the associations with socio-economic factors to assess validity 
of potential confounding factors; to include 
country of birth in analyses; and provide updated data 
on a national scale. 

The results have clinical and public health repercus- 
sions. For example, there is concern about low uptake of 
cancer screening services by South Asians. 28 29 Breast 
cancer screening services need to achieve greater ethnic 
equity, 30 especially as breast cancer mortality seems to be 
converging towards the historically high rates in the 
UK 19 and ethnic minority women seem to be presenting 
with a comparatively high proportion of late-stage 
disease. 31 However, given the low relative rates of 
colorectal cancer and the absence of rapid convergence 
(for example, in Pakistan-born people 9), we might wish 
to review the cost-effectiveness of screening in such 
ethnic groups before implementing new interventions to 
raise the rate of colorectal cancer screening. 

Strengths and limitations of the study 

Retrospective cohort studies have the advantage of being 
low cost and fast in delivering results and, unlike case- 
control studies, provide incidence rates. 32 The strength 
of the study is the development of new methods creating 
a retrospective cohort; high overall linkage rates (95%); 
a large national population (4.65 million people); the 
availability of reported ethnic data on a wide range of 
ethnic groups; a check on whether differential emigra- 
tion by ethnic groups might be creating spurious differ- 
ences by analysis over time using moving averages; the 
exploration of the potential role of socio-economic vari- 
ables and country of birth available in the Census; and 
the linkage of Census data to both cancer registry and 
community/hospital mortality data, so differences in 
rates do not simply reflect varying entry by ethnic group 
to the health system. 

Audits show high completeness and quality of the 
SMR06 file for cancer diagnoses though such statistics by 
ethnic group are not available. 33 34 All deaths are certi- 
fied by a doctor in Scotland and all hospitals are 
required to submit cancer registration data. 

The validity of available indicators of socio-economic 
position, particularly area-based ones derived from post- 
code and census data, is not established in multiethnic 
studies, yet they are usually used in cancer research. 13 35 
Harding's study of mortality including cancer is a rare 
example of using other indicators. 36 We tested eight 
indicators and found that none were consistently asso- 
ciated in the same direction with the outcome (cancer) 
and hence none were valid confounding variables suit- 
able for across-ethnic group comparisons. The recom- 
mendation that studies of ethnic and racial variations 
adjust for socio-economic variables is sound but is not 
readily achievable as using invalid variables will generate 
spurious results. 

Convergence of rates across generations is the predicted 
pattern. 2 A recent review indicated that convergence 
was cancer site-specific and occurring slower than 
expected in Europe. 3 We explored this using the 
country of birth variable in the Census and found this 
pattern was only evident for lung cancer. We acknowledge 
that this may change as more cases occur in 
Scotland-born ethnic minority populations. In future as 
those born in Scotland increase in age, examining 
cancer by ethnic group stratified by country of birth will 
be important. These data break new ground in Europe, 
both in terms of findings and in linkage methods. 3 

Table 2 Lung cancer: directly age standardised annual rates per 100 000 population/year by ethnic group and sex, and related rate ratios, and age and country of birth-adjusted risk ratios (Poisson regression), with corresponding 95% CIs 

Ethnic group* | Events (n) | Population | Directly standardised rate (95% CI) | Age standardised rate ratio, as % (95% CI) | Age adjusted risk ratio (95% CI) | Age and country of birth adjusted risk ratio (95% CI) 

Men 
White Scottish | 15155 | 1212648 | 178.5 (175.7 to 181.3) | 100.0 | 100.0 | 100.0 
Other White British | 983 | 116075 | 124.5 (116.7 to 132.2) | 69.7 (65.2 to 74.2) | 70.4 (63.4 to 78.2) | 84.3 (74.5 to 95.4) 
White Irish | 211 | 15453 | 174.6 (151.3 to 197.9) | 97.8 (84.6 to 110.9) | 99.7 (90.9 to 109.3) | 114.8 (99.9 to 131.9) 
Other White | 151 | 17335 | 141.5 (118.5 to 164.6) | 79.3 (66.3 to 92.2) | 80.7 (74.2 to 87.7) | 94.2 (82.4 to 107.6) 
Any mixed background | 21 | 1400 | 311.6 (184.1 to 439.0) | 174.5 (103.1 to 245.9) | 172.3 (100.5 to 295.5) | 184.9 (114.3 to 299.3) 
Pakistani | 18 | 5353 | 80.3 (37.4 to 123.2) | 45.0 (21.0 to 69.0) | 48.1 (33.7 to 68.7) | 57.7 (37.6 to 88.6) 
Chinese | 15 | 3004 | 112.6 (53.7 to 171.6) | 63.1 (30.1 to 96.1) | 68.3 (49.7 to 93.9) | 81.5 (55.3 to 120.3) 

Women 
White Scottish | 12996 | 1408621 | 131.8 (129.6 to 134.0) | 100.0 | 100.0 | 100.0 
Other White British | 626 | 127254 | 74.7 (68.8 to 80.5) | 56.7 (52.1 to 61.2) | 57.3 (52.3 to 62.7) | 79.9 (72.0 to 88.7) 
White Irish | 177 | 17924 | 119.2 (101.7 to 136.7) | 90.4 (77.0 to 103.8) | 91.7 (75.9 to 110.7) | 120.8 (94.5 to 154.5) 
Other White | 99 | 21210 | 81.0 (64.8 to 97.1) | 61.4 (49.1 to 73.7) | 62.4 (50.1 to 77.6) | 83.5 (66.5 to 105.0) 
Any mixed background | 10 | 1849 | 115.5 (43.5 to 187.5) | 87.6 (33.0 to 142.3) | 86.3 (62.7 to 118.8) | 97.6 (62.9 to 151.4) 
Pakistani | 8 | 4963 | 70.6 (0.0 to 149.6) | 53.5 (0.0 to 113.5) | 37.7 (21.3 to 66.8) | 52.7 (29.5 to 94.2) 
Chinese | 15 | 3250 | 127.3 (50.7 to 203.9) | 96.6 (38.4 to 154.7) | 93.5 (64.6 to 135.3) | 130.7 (95.7 to 178.4) 

*Indian, other South Asian, African origin and other ethnic groups numbers were small and judged to be potentially disclosive. 

Cancer by ethnic group in Scotland 

Figure 2 Lung cancer age standardised rate ratio by ethnic group (legend: RR men, RR women, 95% CI). 

The limitations of the study include the small 
numbers of outcomes for some non-White populations, 
and the consequent aggregation of some ethnic groups, 
though the numbers are large compared with a recent 
paper. 11 The result is imprecision of estimates and insuf- 
ficient numbers to examine survival as others have 
done. We had some variation in linkage rates by ethnic 
group (ranging from 85.1% in other South Asian to 
95.3% in White Scottish) but the potential bias is 
unknown. We think such bias would be small as the vari- 
ation in linkage is most probably due to random causes, 
for example, variations in the spelling of unfamiliar 
names or misrecording of date of birth in NHS data- 
bases. Similarly, there may have been differences in 
response rates by ethnic group in the census but the 
potential bias cannot be assessed for lack of data on 
non-responders. Inability to capture events that occur 
overseas outside the UK is a problem that is not easily 
resolved. Deaths of UK residents are reported back via 
several channels, including embassies and consulates, 
and the primary care registration systems. Such reports, 
however, may not give an accurate cause of death. 
'Salmon bias', whereby sick people return to countries 
of origin to die or for treatment, is potentially important 
but we think it unlikely in Scotland, and not a central 
issue for this analysis. First, in contrast to cancers, we 
find high rates of cardiovascular disorders, including 
chronic ones such as heart failure, in South Asian popu- 
lations. 38 A 'salmon bias' is not likely to be specific to 
cancer but to life-threatening chronic illness. NHS 
Scotland provides excellent services free at the point of 
use so that cancer patients are likely to stay rather than
emigrate. Finally, 90% of our events are incident cases,
not mortality, and the bias applies to mortality data.
Denominator bias would arise from differential migration
by ethnic group. If this occurred then rate ratios
would alter over time. Appendix 3 and table A2 show
that this did not happen for most ethnic groups for all
cancers.

Figure 3 Colorectal cancer age standardised rate ratio by ethnic group (legend: RR men, RR women, 95% CI; x axis: ethnic group).

The greatest limitation of retrospective cohort studies 
is inability to specify which confounding variables and 
risk factors are to be studied, and also to control the 
quality and completeness of outcome data. 32 In our case 
the census gave access to a wide range of relevant expos- 
ure and potential confounding variables. The outcome 
data are of high quality and completeness in Scotland. 
The lack of cancer risk factor data in our retrospective 
cohort is a limitation, as in many studies of this design. 
We have no specific risk factor data to explore hypotheses.
We are starting a pilot project, reporting in 2013, on
linking risk factor data held in primary care to our data,
but even if it is successful we do not envisage
having such data until about 2015. In the meantime, we
have used data from national health surveys 39 40 to help 
interpret the cancer patterns (table 6) as discussed 
below. 



Findings in relation to the literature 

The Scottish context 

Scotland has high cancer rates, probably reflecting
historically high exposure to causal factors such as
smoking, and a diet high in processed foods and low in
fruit and vegetables. 1 41 These factors combine with
comparatively poor socio-economic status, in ways that
are not properly understood. It is of both scientific and
public health significance that people of other ethnic
groups in Scotland do not share White Scottish residents'
propensity to cancer. This applies to both White
and non-White subgroups, though particularly the
latter. Other White British in Scotland, predominantly
English, have lower rates of a range of problems
(including all cancer, but not breast or prostate cancer
in these results). Similar results were found for those
born in England and Wales and living in Scotland, and
those born in Scotland and living in England and Wales,
for example, lower cancer mortality 16 and all-cause
mortality and cardiovascular, 42 3 and alcohol-related
mortality 44 45 in England and Wales born. These differences
are probably linked to the higher socio-economic status
and lower exposure to causal factors of these other
White British (predominantly English) populations
compared to the White Scottish group. This is a less likely
explanation for White Scottish people having (mostly)
higher cancer rates than White Irish, and other White
groups. Examination of White subgroups in epidemiology
is uncommon. Given the potential interest demonstrated
here, more work is warranted, especially in the
acquisition of risk factor data that are integral to the
cohort analysis.

Figure 4 Breast cancer age standardised rate ratio by ethnic group.

The main non-White populations of Scotland are 
Pakistani, Indian and Chinese. They are well established, 
with about half of the population born in the UK. 46 4 
The main Indian, Pakistani and Chinese population 
migrations to Scotland occurred in the mid-1950s 
through the 1970s. People from these ethnic groups born 
abroad have lived on average in Scotland for several 
decades although exact data are not available. In 2001, 
about half of these three ethnic populations lived in the 
West of Scotland in Greater Glasgow and Lanarkshire 
health board areas (http://www.scotpho.org.uk/ 
downloads/ethnic_pop_by_hb.xls, accessed 26 April 
2012) comprising some of the most socio-economically 
deprived areas in Western Europe, known for their high 
death rates for chronic diseases, including cancer. 
Risk factors and socioeconomic status

The socio-economic status of Indian, Pakistani and
Chinese populations in Scotland is hard to assess, as on
some indicators they are better off, for example, housing
tenure, and on others they are worse off, for example,
employment status. 49 Overall, Indian, Pakistani and
White Scottish populations seem to be similar and
Chinese slightly worse off. South Asian populations have
higher cardiovascular disease (CVD) rates 23 and higher
rates of diabetes than the White Scottish population 50
and given that CVD and cancer share risk factors, and
diabetes may raise cancer risk, there is no prior reason to
expect cancer rates to be low in these populations in
Scotland, especially in those born, or long-settled, in
Scotland. Notwithstanding previous work elsewhere, 3
and in Scotland, 9 it is a surprise, therefore, to find that all
cancers and some common cancers are still, decades
after Matheson et al and Merchant et al reported,
substantially less common in non-White populations,
especially in South Asians. Unlike much previous research
using country of birth and deaths data, where wariness
about data artefacts, particularly numerator and
denominator mismatch bias, 3 16 cautions against accepting
large variations as correct, this linked cohort analysis
indicates that differences are possibly even larger than
reported hitherto using proxy measures of ethnicity. 3 16 19
Reduction in the strength of the association is a typical
outcome of non-differential (non-systematic)
mismeasurement error so the increased variations are in line
with epidemiological principles.

Figure 5 Prostate cancer age standardised rate ratio by ethnic group.
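The attenuation argument can be illustrated with a small calculation. The sketch below is not the study's method; it assumes a hypothetical two-group population in which a fixed fraction of each group is recorded in the other group regardless of outcome (non-differential error), and all numbers are illustrative.

```python
def observed_rate_ratio(true_rr, minority_share, misclass):
    """Rate ratio observed when a fraction `misclass` of each group is
    labelled as the other group, independently of disease outcome.

    The reference group's true rate is set to 1 (arbitrary units), so the
    minority group's true rate equals `true_rr`. All inputs are hypothetical.
    """
    ref_share = 1.0 - minority_share

    # People carrying the "minority" label: correctly labelled minority
    # members plus mislabelled reference members.
    lab_min_true = minority_share * (1.0 - misclass)
    lab_min_false = ref_share * misclass
    rate_min = (lab_min_true * true_rr + lab_min_false * 1.0) / (
        lab_min_true + lab_min_false
    )

    # People carrying the "reference" label.
    lab_ref_true = ref_share * (1.0 - misclass)
    lab_ref_false = minority_share * misclass
    rate_ref = (lab_ref_true * 1.0 + lab_ref_false * true_rr) / (
        lab_ref_true + lab_ref_false
    )

    return rate_min / rate_ref


if __name__ == "__main__":
    for m in (0.0, 0.01, 0.05):
        rr = observed_rate_ratio(true_rr=0.5, minority_share=0.02, misclass=m)
        print(f"misclassification {m:.2f}: observed rate ratio {rr:.3f}")
```

With no misclassification the true ratio is recovered; as the error fraction grows the observed ratio moves towards 1, which is why a more accurate measure of ethnicity would be expected to show larger, not smaller, variation.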

Using the Health Surveys for England 39 40 and the
Scottish Health Survey, table 6 summarises the best
available data on some major cancer risk factors, as
identified by Cancer Research UK
(http://info.cancerresearchuk.org/healthyliving/; accessed 26 April
2012). The Scottish population data were collected
separately using very similar methods to those in the Health
Survey for England, except for the red meat question.
Except for physical activity, which may be a reporting
artefact, the White Scottish population has the highest,
or among the highest, prevalence of all nine risk factors,
with the non-White populations, especially women,
having the lowest prevalences. These patterns are in
alignment with the results on all cancer (table 1 and
figure 1). The Scottish Health and Lifestyle Surveys have
very small numbers of people from these populations so
Scottish data have not been published by ethnic group. 51
While little is known about the risk factor profile of
ethnic minority groups in Scotland, some data are
available for Glasgow, the home to a high proportion of
Scotland's non-White population, where questionnaire-based
health and lifestyle surveys have been done. 52
These Glasgow data lend support to findings from the
Health Survey for England in table 6, 40 for example,
smoking is uncommon in South Asian women and in
Indian men but common in Pakistani men; drinking
alcohol is uncommon in South Asian women and
Pakistani men (mostly Muslim) but not so in Indian men;
and the diet is a mix of traditional and Scottish foods
with high fat content, 53 on a par with local populations.
While substantial numbers of Indians are vegetarians, or
occasional eaters of meat, Pakistani populations are not,
with red meat (particularly lamb) being a key dietary
component. 54 55

Implications for research, public health and clinical practice 

More fundamental research is required to explain ethnic 
variations. This requires basic science cancer researchers 
to join forces with epidemiologists, so hypotheses can be 
both generated and tested in multidisciplinary research 
groups. In practical terms, we propose that a research 
unit for the focused study of ethnic variations in cancer 
be set up. In such a research environment, for example, 
hypotheses for the differences in colorectal cancer risk 
could be systematically tested, rather than the current ad 
hoc approach, where interesting observations are made 
but not studied in depth, a problem exemplified in the 
UK since at least 1984. A full discussion of biomedical
hypotheses is beyond the scope of this paper but we con- 
sider in a little detail colorectal cancer, and very briefly 
the other three specific cancers, in relation to risk factors 
to illustrate the potential. 

The well-known 'deficit' of colorectal cancer in South 
Asian populations has led to interest in dietary compo- 
nents, especially spices such as curcumin (a component 
of turmeric) and capsaicin, 57 fibre and other complex 
carbohydrates influencing bile acid metabolism and 
bowel flora, as protective agents. 18 8-60 This line of rea- 
soning assumes a protective agent in South Asian popu- 
lations. An alternative, perhaps more promising line, is 
to assume less exposure to carcinogenic agents in the 
South Asian lifestyle. Meat, particularly red meat, is a 
postulated source of such carcinogens, 61 62 yet Pakistani 
populations are keen red meat consumers (see table 6). 
It may be that processing agents for meat are more 
important than the meat itself as indicated, especially, in 
the earlier 1 ' 2 of recent systematic reviews 61 and also 
recently suggested for cardiovascular risk. 63 It is possible 
that the Pakistani diet contains less processed meat. 
Health Survey for England data, unfortunately, combine 
all red meats (table 6). Unpublished data on the diet of 
infants and very young children in Bradford indicate
Table 6 Pattern* of smoking, alcohol, physical activity, fruit/vegetables and meat eating, hormone replacement therapy and obesity/central obesity by six ethnic groups from Health Surveys for England (1999 and 2004) and Scottish Health Survey (2003 and 1999)

Columns: Ethnic group; N for current smoking (varies for each variable); Percentage currently smoking cigarettes (16 years+); Percentage not current alcohol drinker; Percentage meeting physical activity guidelines† (16+ years in HSE, 16-74 years in Scotland); Percentage consuming 5 or more portions of fruit/vegetables/day; Percentage eating red meat 2+ times/week‡ (HSE 1-6 times/week); Percentage eating meat products 2+ times/week‡; Percentage overweight (BMI>25) or obese (BMI>30); Percentage with raised waist/hip ratio (>0.95); Percentage ever used hormone replacement therapy (HSE 1999, 16 years+)



Men (16 years or more), N for current smoking
Scottish population (predominantly White Scottish): 3582
General population (predominantly White English): 2855
White Irish: 496
Indian: 547
Pakistani: 423
Chinese: 345
Black-African: 379
Black-Caribbean: 403

Women (16 years or more), N for current smoking
Scottish population (predominantly White Scottish): 4514
General population (predominantly White English): 3805



29 



24 



30 
20 
29 
21 
21 
25 

28 



23 



10 
33 
89 
19 
32 
15 

13 



14 



44 



37 



39 
30 
28 
30 
35 
37 

33 



25 



20 



23 



26 
37 
33 
36 
31 
32 

22 



27 



38 



79 
45 
64 
80 

68 
56 



21 



65 



67 



67 
53 
55 
37 
62 
67 

60 



57 



29 



33 



36 
38 
36 
17 
16 
25 

37 



30 



17§ 



18 



White Irish: 653; 26; 11; 29; 32; 67; 58; 37; 19
Indian: 547; 5; 59; 23; 36; 34; 55; 30; 7
Pakistani: 423; 5; 95; 14; 32; 62; 62; 39; 5
Chinese: 345; 8; 33; 17; 42; 72; 25; 32; 8
Black-African: 379; 10; 45; 29; 32; [blank]; 70; 32; [blank]
Black-Caribbean: 403; 24; 21; 31; 31; 61; 65; 37; 8



*Comparative data for predominantly White (general) population are from the 2003 Health Survey for England (HSE).
†30 min or more moderate to vigorous activity on 5 days/week or more.
‡These data are from HSE 1999 as they were not published in HSE 2004. HSE equivalent question is 1-6 times/week.
§1998 Scottish Health Survey 25-74 years, not 16 years+ as in HSE.
BMI, body mass index.
that this is correct: processed meats were a commonly
reported component in White English Bradford infants,
but not in Pakistanis (data examined by Raj S Bhopal as
co-investigator of the Born in Bradford study,
communication of findings with permission from John Wright, PI
of Born in Bradford Project). South Asians are also less
likely to smoke heavily and smoking has been associated
with both colonic and rectal cancer in the Whitehall
one cohort study. 64 In terms of well established
associations for colorectal cancer (table 6) the picture is less
clear: South Asians report eating more fruit and
vegetables and have a lower body mass index, which are
protective, but have higher waist/hip ratio and central
obesity and lower physical activity, which are risks. These
small and inconsistent variations do not reconcile with
the major differences in disease outcomes. These kinds
of hypotheses, which may explain the sustained low rates
of colorectal cancer in some ethnic minority
populations, need detailed study.

The challenges for public health include maintaining
the low rates of cancer in non-White populations while
reducing them in White populations. This is one example
where the general goal of narrowing inequalities needs
careful specification of the change needed. 65 In all
likelihood, given the anticipated tendency to convergence of
disease risks in migrant populations, 3 cancer rates will rise
in non-White ethnic minority groups in Scotland, so
reducing inequalities but worsening public health. The greater
challenge is to reduce inequalities by finding strategies
that encourage convergence of the majority White
populations to the low rates in the non-White groups. Already,
however, it may be too late for breast cancer 12 but
fortunately not for many other cancers. 3 19

Given the likely lower level of reproductive risk factors
(early menarche, late first child, small family and no
breast feeding) and of smoking we expected substantially
less breast cancer in non-White women, especially
South Asians, but the rates were similar in Indians and
other South Asians although still substantially lower in
Pakistanis. The pattern for breast cancer in Pakistanis
accords with historically relatively early marriage,
children and breast feeding, all more common than in the
White Scottish. Unpublished recent data from Scotland
indicate only small ethnic differences in age at first
birth, but substantially more breast feeding in all
non-White groups, especially South Asians (personal
observation as PI using Scottish Health and Ethnicity
Linkage Study maternity data, paper in preparation).
Table 6 shows non-White women were far less likely to
report taking hormone replacement therapy than White
groups. While non-White groups had less overweight/
obesity than White Scottish women, with little difference
in waist/hip ratio, they are generally more adipose, a
phenomenon known to be present at a very young age,
as reflected in skinfold thickness and direct measure of
fat in children. 66 We found no evidence for an excess of
breast cancer in African origin populations, as reported
with contested data in England, although our
numbers are very small. The comparatively low uptake
of breast cancer screening in ethnic minority
populations requires urgent action, 29 including in Scotland
where we have corroborated the findings in England. 30
Breast cancer screening leads to earlier diagnosis and
reduced case fatality so increased participation may lead
to convergence of incidence rates but better outcome.

Screening rates for colorectal cancer are also low in 
South Asians, 29 although Scottish data are not yet avail- 
able by ethnic group. Since screening leads to both 
reduced incidence (removal of polyps and premalignant 
lesions) and early diagnosis we would expect even lower 
rates of colorectal cancer if South Asians participated 
equally in this service. The effectiveness/cost- 
effectiveness data on which colorectal screening is 
based, although solid, 70 are probably not applicable to 
populations with both low rates of colorectal cancer and 
low participation, such as Pakistanis. Modelling of cost- 
effectiveness may help to decide how to proceed, espe- 
cially on the urgency of implementing new interventions 
to raise colorectal screening rates in South Asians. 29 

Given the primary cause of lung cancer is tobacco, 
and tobacco smoking is relatively uncommon in South 
Asians, particularly women and Indians, though not so 
for Pakistani and Bangladeshi men 40 71 the low rates are 
in line with cigarette-smoking patterns (table 6). We 
note the high prevalence of lung cancer in mixed popu- 
lation men but such data need corroboration. Pakistani 
men have the same high prevalence of current smoking 
as White Scottish men but the amount smoked is lower, 
which together with the fact that lifetime exposure to 
tobacco matters, probably explains their lower risk of 
lung cancer. The traditional taboos against smoking in 
South Asian women are holding, in contrast to earlier 
expectations and predictions. 40 The same is not true of 
men, particularly Bangladeshi and Pakistani men, 40 and 
there is evidence in Glasgow that the prevalence of 
smoking in school leaving age South Asian boys is 
similar to that in White boys. 72 Until recently, in
England and Wales, smoking cessation services were not
accessed well by ethnic minority populations, though
this has now changed. The situation in Scotland is
unknown. Lung cancer in South Asian men is likely to
converge towards the White Scottish population rate as 
implied by our analysis with country of birth as a covari- 
ate, as it has done in England and Wales. 

Prostate cancer is known to vary greatly by ethnic
group, with high rates in African origin (Black)
populations and low rates in South Asian groups. 13 73 We
corroborated these patterns, though the risk estimates for this
ethnic group are imprecise. In a recent review, the age
standardised incidence rate in the Black population in
the UK PROCESS study was estimated at 166/100 000,
three times higher than in the White population; 258/
100 000 in the USA; and 304/100 000 in Jamaica. Our
estimate of 326.6/100 000 in African origin populations
fits with these data. Additionally, our data suggest
differences between Indians and Pakistanis and low rates in
Chinese. We also noted low rates in White Irish, and
high rates in other White British. The causes of these
variations are unknown though the patterns have
potential to generate testable hypotheses. However, ethnic
group variations in testing for prostate-specific antigen
(PSA), and subsequent biopsy, are likely to be a major
determinant of variations in the incidence of diagnosed
cancer as implied by recent studies in Scotland and
Ireland. 4 75 In contrast to our findings, prostate cancer
rates are comparatively high in the Republic of Ireland.
This may reflect higher rates of PSA testing and greater
use of biopsies there. Biological understanding of such
ethnic variations is limited, with current attention
focused on genetics, hormones and fat, dietary factors
including fatty acids, and vitamin D. 3 These ethnic
variations provide a good model for disentangling causal
hypotheses, for example, our findings do not support a
major causal role for vitamin D, as the lowest rates are in
populations with the lowest vitamin D levels, that is,
South Asians. 6 One hypothesis currently of interest is
dietary factors such as lycopene in tomatoes being
protective. Valid dietary data across ethnic groups are few 55
but tomatoes are integral to the preparation of many
common meals in the South Asian cuisine. Table 6
indicates a higher level of fruit and vegetable consumption
in all the non-White groups, which fits with the low risk
of prostatic cancer in South Asian and Chinese men but
not with the higher risk (though 95% CI includes 100)
in the African origin groups.

Since Scotland has high rates of cancer we would 
expect that non-White Scottish ethnic groups born in 
Scotland would have higher rates than their parents/ 
grandparents born abroad. Generally, adding country of 
birth led to modest narrowing of the risk difference, but 
in the age group developing cancers, relatively few 
non-White minority patients were born in Scotland. 
Unsurprisingly, the adjustment had most impact for lung 
cancer, as the major risk factor of smoking is socially pat- 
terned. It appears that the protection enjoyed by minor- 
ity groups may be sustained for some cancers across 
generations, and convergence may be slower than 
expected as indicated from studies in Europe. 3 
Definitive analyses will need to wait until the Scottish 
born ethnic minority populations have moved into the 
age groups where cancers are common. 

CONCLUSIONS 

Powerful calls have been made for the collection of data
by ethnic group and not by other proxies. 2 3 77 The
Scottish Health and Ethnicity Linkage Study has shown 
how to obtain national cancer statistics by ethnic group. 
The same methods could be applied wherever a popula- 
tion census or database records ethnic group, as in 
England and Wales, where the large numbers will 
permit a finer disaggregation of ethnic groups with the 
potential of incorporating important covariates such as 
religion, country of birth and social circumstances. The
advantages over solely relying on NHS databases are
a more reliable denominator and linked numerator 
data, longitudinal analysis of outcomes and access to 
relevant economic and social variables not available in 
NHS databases. The findings on all cancers, and specific 
cancers (particularly colorectal, prostate and breast), 
raise important questions on causation, and on public 
health and clinical policies. Risk factor data are required 
to help explain such variations better. Ideally, these 
would be collected within prospective cohort studies. We 
also need to find ways of linking risk factor data from 
other sources such as primary care. In the meantime, we 
need better and ongoing multiethnic cross-sectional 
health surveys across the UK to augment the 1999 and 
2004 Health Surveys for England. 39 40 The study contra- 
dicts the usual viewpoint that the health status of ethnic 
minorities is poor, at least for all-cancers and common 
cancers. The main public health lesson and challenge is 
for the majority population, for the 'Scottish effect' in 
relation to cancer does not apply across Scotland's 
ethnic groups. Can the White Scottish population 
change to enjoy the low rates of cancer seen in other 
ethnic groups in the country? Also, can the non-White 
groups avoid the high risks of cancer in Scotland across 
the generations? This exemplifies how the study of 
ethnic variations provides a public health approach with 
potential to benefit the entire population. 
APPENDIX 1 DETAILS ON LINKAGE METHODS (TEXT
LIGHTLY EDITED FROM OPEN-ACCESS PUBLICATION 23)

Appendix figure 1, republished from our open-access publication, 23
illustrates in concept how record linkage was based on information 
from three datasets: healthcare records, which include personal identi- 
fiers and clinical information; the CHI which contains personal identi- 
fiers and the CHI number; and the census file which contains 
personal identifiers and details of individuals' ethnicity. The 14 ethnic 
groups are given in appendix table A1. The CHI dataset lists in 
Scotland everyone registered with a general practitioner or eligible for 
NHS screening services and forms a unique identifier for NHS use. 
More than 99% of the Scottish population is estimated to be listed on 
the CHI. 

Appendix figure 1 Overview of Record Linkage Process (healthcare records hold the CHI number, personal identifiers and health information; the census database holds personal identifiers, the census number and ethnicity information; record linkage yields a look-up table of CHI number and census number).

Table A1 Linkage rates by ethnic group

No.  Ethnic group                    Number    Percentage
1    White Scottish                  4290153   95.3
2    Other White British             357788    93.6
3    White Irish                     47173     92.2
4    Other White                     74655     87.9
5    Any mixed background            12117     91.7
6    Indian                          13717     89.9
7    Pakistani                       28538     89.8
8    Bangladeshi                     1783      88.0
9    Other South Asian               5810      85.1
10   Caribbean                       1659      89.5
11   African                         4514      86.5
12   Black Scottish or other Black   1057      89.1
13   Chinese                         15115     87.4
14   Other ethnic group              8945      86.2

Date of birth, surname (using soundex codes to allow for variations
in spelling), forename, address and full postcode, which were
available in both data sets, albeit not always recorded identically, were
used to link the census number to the CHI. At this stage, other data
fields in the two datasets were disconnected from identifying
variables. CHI and the census unique number were encrypted prior to
linkage. A one-way cryptographic ('hashing') algorithm (currently
impossible to reverse) was used to encrypt the CHI number. The
census number was encrypted using an algorithm developed by
GROS. For the records deemed to be matches, 73.6% were exact
matches. For the remainder, a probability matching process was
performed. Here, the rate of false positives is critical. Methods have been
developed to identify how false positives occur and what kind of
strategies a human checker employs to decide whether a pair match is
'good'. These decision strategies were built into a 'partitioning'
computer algorithm. These 'partitions' then allow the allocation of effort to
the most profitable 'partitions' which yield the lowest false-positive and
highest true-positive rates.
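To show how soundex codes tolerate spelling variation in surnames, here is a minimal sketch of the widely used American Soundex rules. It is an illustration only: the source does not say which soundex variant was used, and the function name and example surnames are assumptions.

```python
# Letter-to-digit groups of American Soundex; vowels, H, W and Y have no digit.
_GROUPS = (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
)
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(name):
    """Return the 4-character Soundex key of `name` ('' if it has no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated digit is kept after it
    return (key + "000")[:4]


if __name__ == "__main__":
    # Two spellings of the same surname map to one key.
    print(soundex("Bhopal"), soundex("Bopal"))
```

Records whose surnames share a key can then be compared on the other identifiers (date of birth, forename, postcode), so that a misspelt name does not by itself prevent a link.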

Once the linkage was completed personal identifying variables 
(such as names, address, postcode and dates of birth) were removed 
leaving a file with an encrypted CHI number and its corresponding 
encrypted census number (look up file). A census extract containing 
ethnic code (and limited other data including age, sex and indicators 
of socio-economic status) was joined to the above look-up file using 
the encrypted census number. The encrypted census numbers were 
then discarded leaving the ethnicity code, some other variables from 
the census, the encrypted CHI number and a newly generated index 
number unrelated to other numbers for the exclusive use of this 
project. The relevant parts of the ISD-linked database were linked via 
the encrypted CHI numbers. The encrypted CHI was replaced with an 
unrelated serial number (to keep together the multiple records on the 
same people), resulting in depersonalised clinical health records car- 
rying census ethnicity codes. Using methods previously described we 
estimated an upper limit to the false-positive linkage rate of 0.08%. 23 
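The pseudonymisation step can be sketched as follows. The source states only that a one-way hashing algorithm was applied to the CHI number; the use of HMAC-SHA256, the secret key, the function names and the made-up identifiers below are all assumptions for illustration.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """One-way keyed hash of an identifier (HMAC-SHA256, assumed here).

    The same identifier and key always give the same token, so records can
    still be joined, but the token cannot feasibly be reversed.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def build_lookup(linked_pairs, secret_key):
    """Map hashed health identifier -> unrelated serial number.

    `linked_pairs` is a hypothetical list of (health_id, ethnic_code) tuples;
    the result carries no original identifiers.
    """
    lookup = {}
    for serial, (health_id, ethnic_code) in enumerate(linked_pairs, start=1):
        lookup[pseudonymise(health_id, secret_key)] = (serial, ethnic_code)
    return lookup


if __name__ == "__main__":
    key = b"example-key-not-for-real-use"
    table = build_lookup([("0000000001", "1"), ("0000000002", "7")], key)
    for token, (serial, code) in table.items():
        print(token[:12], serial, code)
```

Replacing the hashed identifier with a serial number, as in the final step described above, keeps multiple records for one person together without retaining anything derived from the original number.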



APPENDIX 2 METHODS FOR CALCULATING RATES, RATIOS 
AND RRs 

To calculate DASRs we used the cohort denominator at April 2001,
and for the numerator the first event cancers for 7 years thereafter.
We divided the result by 7 to get an annual rate. We had no
information on emigration to recalculate denominators over time. Non-cancer
outcomes were not available because of concerns over disclosure
(see ethics and disclosure). We did not adjust the denominator to
remove 50% of the people who developed cancer because the
outcome is rare. For example, for all cancer in the White Scottish
population the adjusted denominator would be 1 433 584
− (0.5 × 71 094) = 1 398 037, which is 97.5% of the original. (It is standard practice to
remove half of the numerator from the denominator when readjusting
denominators in these circumstances. 32) The recalculated directly
standardised rate is 726.5 compared with our reported figure of 708.5,
a 2.5% difference, for our commonest outcome. The difference would
be much smaller for the specific cancers. The effect on rate ratio and
RRs would be very small, and less than this. Our approach has the
merit of simplicity and is standard in descriptive epidemiology for rare
outcomes 32 and has been adopted across SHELS analyses. The
approach here, modelling cumulative incidence (risks) rather than
person-time incidence, is appropriate when the number no longer
at risk at the end of the observation period is not high (as here), when
the period of observation is not highly variable (as here) and when the
main comparisons are with a general population (as here). Szklo and
Nieto's 32 established textbook notes that the cumulative incidence
approach we have used leads to a lower absolute value for the
incidence than with a person-time rate but when events are rare (as
here) the discrepancy is small.
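The denominator adjustment quoted above can be checked with a few lines of arithmetic. This sketch uses only the figures given in the text (population 1 433 584, 71 094 first cancers, 7 years); treating the rate as a simple annual rate per 100 000 is an assumption made here for illustration, and the variable names are not from the source.

```python
population = 1_433_584   # White Scottish cohort denominator quoted in the text
cases = 71_094           # first cancers over the follow-up period
years = 7

# Annual rate per 100 000 using the unadjusted denominator.
annual_rate = cases / population / years * 100_000

# Remove half of the cases from the denominator, as described in the text.
adjusted_population = population - 0.5 * cases
adjusted_rate = cases / adjusted_population / years * 100_000

print(f"adjusted denominator: {adjusted_population:.0f}")
print(f"share of original:    {adjusted_population / population:.1%}")
print(f"rate, unadjusted:     {annual_rate:.1f} per 100 000")
print(f"rate, adjusted:       {adjusted_rate:.1f} per 100 000")
```

Under this assumption the two rates agree with the reported 708.5 and 726.5 to one decimal place, a difference of about 2.5%.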

We constructed Poisson models with age only and then included 
variables where we had a specific hypothesis; so there was no 
unspecified exploration (fishing), and no modelling with forward or 
backward selection to include as many significant cofactors as 
possible. 

With robust variance we mean the empirical (robust) estimator of 
the covariance matrix. It has the property of being a consistent 



Table A2 Number of cases (N) and age-adjusted risk ratios (RR) for five overlapping time periods for all cancers (except for non-melanoma skin cancer) by ethnic group

Periods, in order: 2001-2004; 2002-2005; 2003-2006; 2004-2007; 2005-2008. Each entry is N, RR.

Men
White Scottish: 29719, 100; 29784, 100; 29358, 100; 29510, 100; 29392, 100
Other White British: 2413, 87.6; 2432, 88.1; 2425, 89; 2427, 88.4; 2432, 88.8
White Irish: 383, 92.1; 404, 97.6; 385, 95; 403, 99.6; 395, 98.7
Other White: 344, 89.8; 327, 86.3; 318, 86.1; 309, 84; 288, 79.2
Any mixed background: 24, 95; 26, 102.1; 29, 114.6; 27, 105.1; 27, 104.4
Indian: 18, 34.7; 22, 41.3; 27, 50.1; 30, 53.9; 28, 49.1
Pakistani: 40, 52.2; 40, 50.5; 36, 44.7; 39, 46.7; 47, 54.5
Other South Asian: 13, 58.3; 15, 64.9; 17, 71.9; 16, 64.9; 16, 62.5
African origin: 19, 92; 19, 90; 18, 84.1; 13, 58.8; 12, 52.7
Chinese: 27, 60.3; 27, 58.3; 30, 63.7; 31, 63.4; 36, 71.3
Other ethnic group: 7, 40.7; *, *; 6, 31.9; 7, 35.1; 14, 66.3

Women
White Scottish: 31535, 100; 31650, 100; 31854, 100; 32080, 100; 31926, 100
Other White British: 2453, 90.5; 2527, 92.6; 2456, 89.2; 2465, 88.6; 2395, 86.1
White Irish: 403, 88.7; 399, 87.9; 411, 90.5; 413, 90.9; 426, 94.8
Other White: 320, 79.1; 319, 77.6; 330, 79; 335, 78.6; 362, 84.3
Any mixed background: 28, 89.2; 32, 99.7; 23, 70.1; 29, 85.9; 24, 70
Indian: 14, 34.3; 18, 42.3; 22, 49.7; 21, 45.3; 24, 50.2
Pakistani: 38, 59.4; 36, 53.5; 46, 65.1; 50, 66.9; 48, 61.7
Other South Asian: 21, 107; 20, 99.1; 16, 77.1; 13, 60.5; 15, 68.2
African origin: 22, 106.3; 22, 102.4; 16, 71.9; 17, 73.1; 11, 45.9
Chinese: 40, 85.6; 45, 92.5; 48, 95; 48, 90.7; 43, 78.8
Other ethnic group: 18, 71.4; 18, 67.9; 22, 79.1; 24, 81.3; 25, 81.2

*Potentially disclosive so suppressed.

Table A3 Age-adjusted relative increase (%) in risk of cancer (and 95% CI) for each category increase in the variable for census-derived socio-economic variables

Ethnic group | SIMD† (quantitative) | Highest qualification (individual)‡ (quantitative) | Highest qualification (household) (quantitative) | NS-SEC (individual)§ (quantitative) | NS-SEC (household)§ (quantitative) | Car ownership¶ (1+ vs 0) | Household tenure (owned vs rented)** | Activity last week (working vs inactive)††‡‡

(A) Men
White Scottish | 5.5 (4 to 7)* | 9.9 (8 to 11.7) | 9.9 [CI illegible] | 7.9 [CI missing] | 7.8 [CI illegible] | 15.5 [CI illegible] | 11.8 [CI illegible] | 25.9 [CI illegible]
Other White British | 4.5 [CI illegible]* | 5.5 (3.4 to 7.7) | 7.2 [CI illegible] | 5.4 (3 to 7.8)* | 6.8 [CI missing] | 8 [CI illegible] | 6.5 (1.9 to 11) | 16.2 [CI illegible]
White Irish | 4.4 (0.5 to 8)* | 5 (0 to 9.9) | 6.5 (3.4 to 9.4) | 9.8 (1.8 to 18.4)* | 5.5 (-1.3 to 12.9) | 12.7 (4.4 to 20.4) | 10.1 (3.5 to 16.1) | 24.8 (14.1 to 34.2)
Other White | 4.4 (0.4 to 8.2)* | 7.5 (-0.7 to 15.4) | 6.7 (-5.5 to 17.5) | 14.6 (7.5 to 22.1) | 7.3 (-4.3 to 20.4) | 4.2 (-11.1 to 17.4) | -7 (-10.9 to -3.3) | 20.8 (7.7 to 32)
Any mixed background | 12 (-3.8 to 25.4) | -15 (-29.4 to -2.2) | -9.2 (-42.5 to 16.3) | 6.7 (-26.3 to 54.5) | 35.8 (3.5 to 78.1) | 28.9 (17.1 to 39) | 8.5 (-29.1 to 35.3) | 23.7 (-54.6 to 62.4)
Indian | 2.5 (-8.8 to 12.7) | 6 (-20.2 to 26.7) | 19.6 (0 to 35.3) | 2.7 (-24.5 to 39.8) | 0.9 (-24 to 34) | -100.9 (-328.9 to 5.9) | 33.4 (9.3 to 51.1) | 56.6 (36.9 to 70.1)
Pakistani | 4.7 (-8.8 to 16.6) | 8.5 (-14.1 to 26.7) | 4.5 (-11.4 to 18.3) | 22.4 (-14.8 to 75.8) | 4.4 (-13.8 to 26.5) | 7.4 (-66.9 to 48.6) | -17.8 (-43.1 to 3) | 26.9 (10.5 to 40.2)
Other South Asian | 0 (-13.5 to 12.1) | 8.5 (-13.4 to 26.2) | 24.3 (6.9 to 38.4) | 29.4 (-7 to 80) | 11.3 (-19.8 to 54.4) | 19.6 (-26.1 to 48.7) | -5 (-70.2 to 35.1) | 47.2 (-37.8 to 79.8)
Black | 11.5 (-13.5 to 31) | -2 (-57.3 to 33.7) | -4.3 (-48.6 to 21.4) | 2 (-18.6 to 28) | -10.1 (-35.3 to 24.9) | 14.7 (-45.6 to 50) | 6.3 (-39.3 to 37) | 25 (-19.4 to 52.8)
Chinese | -2.2 (-14.2 to 8.5) | -17.8 (-56.6 to 11.5) | 2.2 (-19.1 to 19.7) | -7.3 (-38.1 to 38.9) | -5.4 (-24.9 to 19) | 35.2 (22.2 to 46) | 34.4 (16.1 to 48.7) | 22.9 (-30 to 54.3)
All other ethnic group | 12 (-21.5 to 36.3) | -3.3 (-37.6 to 22.5) | -12.8 (-130.6 to 44.8) | -48.2 (-77.5 to 19.2) | -39.9 (-68.9 to 16.3) | -5 (-171.1 to 59.3) | -105.8 (-531.6 to 33) | -46.9 (-292.6 to 45.1)

(B) Women
White Scottish | 4.7 (3.9 to 5.4) | 6.3 (4.5 to 8)* | 6.7 (5.2 to 8.2)* | 3 (0.5 to 5.9)* | 3 (0.9 to 5.4) | 12.9 (9.5 to 16) | 8.9 (3.4 to 14.1) | 14.6 (10.3 to 18.8)
Other White British | 3.5 (1.2 to 5.8)* | 4.5 (2.5 to 6.7) | 7.2 (5 to 9.3) | 0.2 (-2 to 2.8) | 4 (0.7 to 7.5)* | 15.5 (9 to 21.5) | 0.9 (-8.8 to 9.7) | 7 (2.3 to 11.6)
White Irish | 0.4 (-3 to 3.8) | -1 (-10.9 to 8) | -0.5 (-10.2 to 8.4) | -14.6 (-24.1 to -4)* | -4.5 (-11.6 to 3) | 7.3 (0.5 to 13.5) | 8.4 (-1.2 to 17.1) | -0.4 (-17.4 to 14.1)
Other White | 4 (0.7 to 7.4)* | 7.4 (4 to 10.7) | 8.4 (0 to 16.2) | 0.9 (-7.4 to 10) | 10.7 (2.5 to 19.5) | 10 (-1.4 to 20.1) | -22.5 (-40.3 to -7) | 4.5 (-19.2 to 23.4)
Any mixed background | 10.8 (-2.9 to 22.6) | 6.8 (-20 to 27.6) | 6.2 (-21 to 27.2) | 23.5 (-11.4 to 72.3) | 11.7 (-18.2 to 52.6) | 16.6 (-17.4 to 40.7) | 19.6 (-20.6 to 46.4) | -7.5 (-37.8 to 16.1)
Indian | -11.6 (-39.6 to 10.9) | -20 (-33.4 to -7.9) | -18.2 (-42.2 to 1.7) | -2.5 (-17.6 to 15) | 7.9 (-22.1 to 49.5) | -13.5 (-96.3 to 34.4) | -20.4 (-94.9 to 25.6) | -44.1 (-97 to -5.4)
Pakistani | -3.3 (-15.9 to 7.9) | 6.5 (-41.3 to 38.3) | 6.8 (-10.2 to 21.1) | -8.7 (-24.5 to 10.4) | 0 (-15.8 to 18.7) | 23.5 (3.5 to 39.4) | -10.1 (-97.8 to 38.7) | -21.4 (-1518 to 8.8)
Other South Asian | -14.1 (-34.5 to 3.2) | -9 (-44.9 to 18.1) | -5.5 (-43.2 to 22.1) | -5.5 (-32.8 to 32.8) | -21.8 (-45.1 to 11.4) | -65.6 (-189.9 to 5.4) | -44.8 (-107.9 to -0.7) | 24.6 (-126.2 to 74.9)
Black | 5 (-25.6 to 28.1) | -14 (-44.5 to 10) | 0.2 (-38.1 to 27.9) | 4.7 (-26.9 to 49.9) | -13.3 (-34.4 to 14.6) | -39.5 (-166.4 to 26.9) | -21.3 (-105.6 to 28.4) | -21.2 (-89.3 to 22.4)
Chinese | 8.2 (-2.5 to 17.8) | -1.8 (-30.5 to 20.6) | -16.2 (-34.6 to -0.2) | -12.8 (-27.5 to 4.7) | -3.9 (-19.3 to 14.5) | -17.4 (-51.9 to 9.3) | 1.5 (-39 to 30.2) | -38.2 (-68.5 to -13.2)
All other ethnic group | 7 (-4.5 to 17.4) | 0.2 (-39.4 to 28.7) | -73.2 (-147.5 to -21.2) | 2.9 (-22.4 to 36.5) | -0.2 (-21.2 to 26.3) | 3.3 (-62.6 to 42.5) | -40.9 (-96.5 to -1) | 34.8 (7.5 to 54.1)

*Trend of increase across categories shows a significant departure from linearity.
†Figures are for each quintile increase in Scottish Index of Multiple Deprivation (SIMD).
‡Figures are for each category increase in highest qualification, that is, from none to low and low to high.
§Figures are for each category change in NS-SEC grouping, from N (never worked) to M (managerial and professional groups).
¶Figures indicate difference in incidence between those who do not own cars and those who do.
**Figures indicate difference in incidence between those who rent and those who own their house.
††Figures indicate difference in cancer between those who were inactive and those working last week.
‡‡The analyses were on all cancers at all ages, so the number of cases differs slightly from table 1. Data Disclosure Committee ruled that publication of numerators here was not permissible as it would be potentially disclosive.
NS-SEC, National Statistics Socio-Economic Classification.
estimator of the covariance matrix, even if the working correlation
matrix is misspecified. Some relevant papers are: 

Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a
generalized estimating equation approach. Biometrics 1988;44:1049-60.

Royall RM. Model robust inference using maximum likelihood
estimators. Int Statist Rev 1986;54:221-6.

White H. Maximum likelihood estimation of misspecified
models. Econometrica 1982;50:1-25.

We used SAS for our statistical analysis, and the user documentation
advises that including the statement 'REPEATED
SUBJECT=.../TYPE=unstr;' produces empirical (or robust) estimators,
even if there is only one observation per subject.
The subject identifier also needs to be put in the CLASS statement.
We can supply the full computer code to interested readers.
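Why the robust estimator matters when a Poisson model is fitted to a yes/no outcome can be seen in a minimal sketch. It is written in Python rather than SAS, it is not the study code, and it covers only the simplest case: one exposed group against one reference group with no age adjustment, where both variances have a closed form. The function name and the counts are hypothetical.

```python
import math


def log_rr_with_ses(cases_exp, n_exp, cases_ref, n_ref):
    """Log risk ratio with model-based and robust (sandwich) standard errors
    for a Poisson model with a single binary covariate and a 0/1 outcome."""
    p_exp = cases_exp / n_exp
    p_ref = cases_ref / n_ref
    log_rr = math.log(p_exp / p_ref)

    # Model-based: Poisson assumes variance = mean, giving 1/cases per group.
    var_model = 1 / cases_exp + 1 / cases_ref

    # Robust: squared residuals replace the assumed variance. For a 0/1
    # outcome this gives (1 - p) / cases per group (the binomial variance).
    var_robust = (1 - p_exp) / cases_exp + (1 - p_ref) / cases_ref

    return log_rr, math.sqrt(var_model), math.sqrt(var_robust)


# Hypothetical counts: 400 cases among 10,000 people versus
# 30,000 cases among 500,000 people in the reference group.
log_rr, se_model, se_robust = log_rr_with_ses(400, 10_000, 30_000, 500_000)

for label, se in (("model-based", se_model), ("robust", se_robust)):
    low = math.exp(log_rr - 1.96 * se)
    high = math.exp(log_rr + 1.96 * se)
    print(f"{label}: RR {math.exp(log_rr):.3f} (95% CI {low:.3f} to {high:.3f})")
```

In this two-group case the robust standard error is never larger than the model-based one, because the Poisson variance overstates the variance of a binary outcome; for rare outcomes the two are close.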



APPENDIX 3 MOVING AVERAGE ANALYSIS OF ALL- 
CANCER OVER TIME TO CHECK FOR EFFECTS OF 
CHANGING DENOMINATORS 

For text interpreting the results in table A2, see the Results section
(All cancers without non-melanoma skin cancer) and the Discussion
(strengths and weaknesses).
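The overlapping periods in table A2 can be generated, and the stability of the risk ratios across them summarised, with a short sketch. The helper names are hypothetical, the RR values are transcribed from the men's rows of table A2 for three groups only, and the spread (maximum minus minimum) is an illustrative summary rather than a statistic reported in the paper.

```python
def overlapping_windows(first_year, last_year, width):
    """Return (start, end) pairs for every window of `width` calendar years."""
    return [(start, start + width - 1)
            for start in range(first_year, last_year - width + 2)]


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


windows = overlapping_windows(2001, 2008, 4)

# RR per window for men, transcribed from table A2 (White Scottish = 100).
rr_men = {
    "Other White British": [87.6, 88.1, 89.0, 88.4, 88.8],
    "Indian": [34.7, 41.3, 50.1, 53.9, 49.1],
    "Pakistani": [52.2, 50.5, 44.7, 46.7, 54.5],
}

print("windows:", ", ".join(f"{a}-{b}" for a, b in windows))
for group, ratios in rr_men.items():
    print(f"{group}: RR {min(ratios)} to {max(ratios)} "
          f"(spread {spread(ratios):.1f})")
```

Groups with few cases per window, such as Indian men, show a wider spread than large groups, which is what would be expected from sampling variation alone.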

APPENDIX 4 ASSESSING THE POTENTIAL TO ADJUST
FOR PUTATIVE CONFOUNDING VARIABLES

The data in table A3(A) for men and table A3(B) for women show
that none of the eight variables was consistently associated with
cancer, that is, with the same direction of association in every
ethnic group. Mostly the variables were associated as expected
(though not always with linear effects) in the White groups but less
so in the non-White ethnic groups. For example, for SIMD (Scottish
Index of Multiple Deprivation) in men, the association varied widely
across ethnic groups, from a decrease in cancer with increase in
deprivation (-2.2%) to an increase in most groups, for example, 5.5% in
White Scottish. In addition, SIMD did not show a linear increase in
cancer with each category change in score (indicated by asterisk).

Table A4 shows that for no variable was the direction of association
the same in all ethnic groups. SIMD was closest (10/11
times in men and 8/11 in women). However, our prior agreed definition
of a valid confounding variable for the purposes of our analysis
was that the direction of association should be the same in all
ethnic groups. The alternative would have been to exclude some
populations from adjustment for confounders. However, there are
two good reasons for not doing this: (1) it would be against the
general approach of examining across groups and would go
counter to our prior analysis strategy, and (2) the scientific literature
generally shows that area-based measures are not consistent confounders
across ethnic groups. We concluded, therefore, that
adjusting using these variables would be open to criticism.
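The direction-of-association rule described above can be expressed as a minimal sketch. The function names are hypothetical, and the estimates are the SIMD values for women as transcribed from table A3(B), assuming the first figure in each row is the SIMD column; treating an estimate of exactly zero as having no direction is also an assumption made here.

```python
def direction_counts(estimates):
    """Count groups with a positive and with a negative association."""
    positive = sum(1 for value in estimates.values() if value > 0)
    negative = sum(1 for value in estimates.values() if value < 0)
    return positive, negative


def is_valid_confounder(estimates):
    """Prior rule: the direction must be the same in every ethnic group."""
    positive, negative = direction_counts(estimates)
    return len(estimates) in (positive, negative)


# Relative increase (%) in cancer risk per SIMD quintile, women,
# transcribed from table A3(B) under the assumption stated above.
simd_women = {
    "White Scottish": 4.7,
    "Other White British": 3.5,
    "White Irish": 0.4,
    "Other White": 4.0,
    "Any mixed background": 10.8,
    "Indian": -11.6,
    "Pakistani": -3.3,
    "Other South Asian": -14.1,
    "Black": 5.0,
    "Chinese": 8.2,
    "All other ethnic group": 7.0,
}

positive, negative = direction_counts(simd_women)
print(f"SIMD, women: {positive}/{len(simd_women)} groups show an increase")
print("valid confounder under the prior rule:", is_valid_confounder(simd_women))
```

Read this way, the count agrees with the 8/11 reported for SIMD in women, and the variable fails the rule because three groups show the opposite direction.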